Vertex Degree Distribution for the Graph of Word Co-Occurrences in Russian
Authors
Abstract
Degree distributions for word-form co-occurrences in large Russian text collections are obtained. Two power laws fit the distributions well, which supports the Dorogovtsev-Mendes model for Russian. Several different Russian text collections were studied, and statistical errors are shown to be negligible. The model exponents for Russian are found to differ from those for English, probably because of differences in the structure of the collections. In contrast, the estimated size of the supposed kernel lexicon is almost the same for both languages, which supports the idea that word forms are important for the human perceptual lexicon.
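As a minimal sketch of the quantity studied here, the snippet below builds a word co-occurrence graph from tokenized sentences and computes its vertex degree distribution. The abstract does not say how co-occurrence is defined, so linking only adjacent word forms within a sentence is an assumption, and the function names and toy sentences are hypothetical illustrations rather than the authors' procedure or data.

```python
from collections import Counter, defaultdict


def cooccurrence_degrees(sentences):
    """Map each word form to its degree in the co-occurrence graph.

    Assumption: two distinct word forms are linked when they appear
    next to each other in the same sentence.
    """
    neighbours = defaultdict(set)
    for tokens in sentences:
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # ignore self-loops from repeated words
                neighbours[left].add(right)
                neighbours[right].add(left)
    return {word: len(adjacent) for word, adjacent in neighbours.items()}


def degree_distribution(degrees):
    """Return {k: fraction of vertices whose degree equals k}."""
    counts = Counter(degrees.values())
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


# Toy corpus, purely illustrative (not data from the study).
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
degrees = cooccurrence_degrees(sentences)
distribution = degree_distribution(degrees)
```

On a real collection, the resulting distribution would typically be plotted on log-log axes, where a two-regime power law appears as two approximately straight segments with different slopes.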
Similar resources
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...
Some Results on Forgotten Topological Coindex
The forgotten topological coindex (also called Lanzhou index) is defined for a simple connected graph G as the sum of the terms d_u^2 + d_v^2 over all non-adjacent vertex pairs uv of G, where d_u denotes the degree of the vertex u in G. In this paper, we present some inequalit...
On discriminativity of vertex-degree-based indices
A recently published paper [T. Došlić, this journal 3 (2012) 25-34] considers the Zagreb indices of benzenoid systems, and points out their low discriminativity. We show that analogous results hold for a variety of vertex-degree-based molecular structure descriptors that are being studied in contemporary mathematical chemistry. We also show that these results are straightforwardly obtained by u...
Splice Graphs and their Vertex-Degree-Based Invariants
Let G_1 and G_2 be simple connected graphs with disjoint vertex sets V(G_1) and V(G_2), respectively. For given vertices a_1 ∈ V(G_1) and a_2 ∈ V(G_2), a splice of G_1 and G_2 by vertices a_1 and a_2 is defined by identifying the vertices a_1 and a_2 in the union of G_1 and G_2. In this paper, we present exact formulas for computing some vertex-degree-based graph invariants of the splice of graphs.
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process: given the documents, it extracts the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...